feat: add executable Tier 2 Agent Teams patterns #195
Conversation
…xsim-batch SKILL.md

Replace prose descriptions of Tier 2 competitive implementation with concrete TeamCreate/SendMessage call syntax in execute.md section 6.3. Add three complete Tier 2 workflow patterns to maxsim-batch SKILL.md: competitive implementation (debate), multi-reviewer code review (cross-checking), and collaborative debugging (adversarial hypothesis testing). Each pattern includes TeamCreate, teammate spawn, SendMessage exchange, verifier resolution, and Tier 1 graceful degradation fallback. Remove the "planned but not yet implemented" disclaimer. Address the PROJECT.md §7.2 audit gap (Parallelism PARTIAL 1-3).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
🎉 This PR is included in version 5.15.0 🎉

The release is available on:

Your semantic-release bot 📦🚀
Pull request overview
This PR upgrades Tier 2 Agent Teams guidance from prose to concrete, copy/pasteable workflow patterns so orchestrators can apply TeamCreate/SendMessage-based collaboration (with Tier 1 fallbacks) during execution, reviews, and debugging.
Changes:
- Replaces the Tier 2 “debate” description in `execute.md` with a step-by-step TeamCreate + teammate spawn + SendMessage critique + verifier selection flow (with Tier 1 fallback).
- Adds three complete Tier 2 patterns to `maxsim-batch/SKILL.md` (competitive implementation, multi-reviewer cross-checking, collaborative debugging), including activation checks and per-pattern Tier 1 degradations.
- Removes the “planned but not yet implemented” disclaimer from `SKILL.md`.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| templates/workflows/execute.md | Adds concrete Tier 2 competitive debate steps (TeamCreate/SendMessage/verifier) and explicit Tier 1 fallback path. |
| templates/skills/maxsim-batch/SKILL.md | Documents executable Tier 2 Agent Teams patterns + activation check + strengthened graceful-degradation guidance. |
```
SendMessage({
  type: "message",
  recipient: "competitor-b",
  content: "Review competitor-a's implementation. Identify weaknesses, edge cases missed, and potential issues. Be adversarial -- find real problems, not style preferences. Report: (1) correctness issues, (2) missing edge cases, (3) maintainability concerns.",
  summary: "Requesting adversarial review of competitor-a's work"
})

SendMessage({
  type: "message",
  recipient: "competitor-a",
  content: "Review competitor-b's implementation. Identify weaknesses, edge cases missed, and potential issues. Be adversarial -- find real problems, not style preferences. Report: (1) correctness issues, (2) missing edge cases, (3) maintainability concerns.",
  summary: "Requesting adversarial review of competitor-b's work"
})
```

Each teammate responds with a structured critique. This fights LLM anchoring bias -- the first plausible answer does not automatically win.

**Step 2d -- Verifier selects winner:**

Spawn a fresh verifier agent (NOT a team member) to evaluate both implementations and both critiques:

```
Agent(
  subagent_type: "verifier",
  model: "{verifier_model}",
  prompt: "
    You are judging a competitive implementation. Two (or three) agents each implemented the same task independently, then reviewed each other's work adversarially.

    ## Implementations
    - competitor-a (CONSERVATIVE): {summary or path to worktree-a}
    - competitor-b (INNOVATIVE): {summary or path to worktree-b}

    ## Critiques
    - competitor-b's critique of competitor-a: {critique-b-of-a}
    - competitor-a's critique of competitor-b: {critique-a-of-b}

    ## Selection Criteria (in priority order)
    1. Correctness -- does it satisfy all success criteria?
    2. Test coverage -- are edge cases tested?
    3. Code quality -- readability, maintainability, consistency with codebase
    4. Simplicity -- prefer fewer abstractions when correctness is equal

    Output exactly: WINNER: competitor-{a|b|c}
    Followed by a justification paragraph.
  "
)
```
Step 2c says “each reviews the others’ work”, but the example SendMessage exchange only covers competitor-a ↔ competitor-b. If competitor-c is spawned (critical tasks), it would receive no critique request and its feedback won’t be available to the verifier. Expand the debate phase to include competitor-c (e.g., round-robin critiques) or explicitly state that Tier 2 debate is only for 2 competitors.
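The round-robin expansion the comment asks for can be sketched as a generator over ordered (reviewer, author) pairs, so an optional competitor-c is covered automatically. This is an illustrative Python helper, not part of the repo; the payload fields mirror the SendMessage schema used in the examples above.

```python
from itertools import permutations

def round_robin_critiques(competitors):
    """Build one adversarial-review request per ordered (reviewer, author)
    pair, so every competitor critiques every other competitor."""
    requests = []
    for reviewer, author in permutations(competitors, 2):
        requests.append({
            "type": "message",
            "recipient": reviewer,
            "content": (
                f"Review {author}'s implementation. Be adversarial: "
                "(1) correctness issues, (2) missing edge cases, "
                "(3) maintainability concerns. "
                "Find real problems, not style preferences."
            ),
            "summary": f"Requesting adversarial review of {author}",
        })
    return requests

# Two competitors yield 2 critiques; three yield 6 (each reviews both others).
assert len(round_robin_critiques(["competitor-a", "competitor-b"])) == 2
assert len(round_robin_critiques(
    ["competitor-a", "competitor-b", "competitor-c"])) == 6
```

With this shape, the verifier prompt can then be assembled from the full critique list rather than the fixed a↔b pair.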
```
Output exactly: WINNER: competitor-{a|b|c}
Followed by a justification paragraph.
```
The verifier prompt instructs “Output exactly: WINNER: competitor-{a|b|c}” and also asks for a justification paragraph. “Output exactly” is ambiguous/contradictory here and can cause downstream parsers to fail if they expect a single-line output. Clarify that the first line must be exactly the WINNER line, followed by free-form justification.
Suggested change:
```diff
-Output exactly: WINNER: competitor-{a|b|c}
-Followed by a justification paragraph.
+On the first line, output exactly one of: WINNER: competitor-a, WINNER: competitor-b, or WINNER: competitor-c
+After that first line, output a justification paragraph explaining your choice.
```
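If a downstream step parses the verifier output, pinning the contract to "first line is the WINNER line" keeps that parser trivial. A minimal Python sketch, assuming that contract (the `parse_winner` helper is hypothetical, not repo code):

```python
import re

def parse_winner(verifier_output: str) -> str:
    """Return the winner named on the first non-empty line.

    Assumes the first line is exactly 'WINNER: competitor-<x>' and
    everything after it is free-form justification, which is ignored.
    """
    lines = [ln.strip() for ln in verifier_output.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty verifier output")
    match = re.fullmatch(r"WINNER: (competitor-[abc])", lines[0])
    if match is None:
        raise ValueError(f"unparseable first line: {lines[0]!r}")
    return match.group(1)

assert parse_winner("WINNER: competitor-b\nIt covered more edge cases.") == "competitor-b"
```

Any output that buries the winner mid-paragraph fails fast instead of silently picking the wrong branch.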
```
subagent_type: "verifier",
model: "{verifier_model}",
```
This section mixes tool-call argument styles (earlier in the file Agent(...) is shown with subagent_type="executor" and run_in_background=true, but this new snippet uses subagent_type: "verifier", model: "...", commas, etc.). Since this is meant to be executable syntax, please standardize on one argument format within execute.md to avoid copy/paste errors.
Suggested change:
```diff
-  subagent_type: "verifier",
-  model: "{verifier_model}",
+  subagent_type="verifier",
+  model="{verifier_model}",
```
```
TeamCreate(team_name: "probe-{timestamp}", description: "availability check")
# If probe fails, set TIER=1 and log reason
```
The Tier 2 activation “probe TeamCreate (lightweight — create and immediately clean up)” example never shows the cleanup step. As written it will leave probe teams under ~/.claude/teams/ and ~/.claude/tasks/ on every run. Add an explicit TeamDelete step (or a deterministic probe name + delete) so the probe is actually lightweight/idempotent.
Suggested change:
```diff
-TeamCreate(team_name: "probe-{timestamp}", description: "availability check")
-# If probe fails, set TIER=1 and log reason
+probe_name = "probe-tier2-activation"
+TeamCreate(team_name: probe_name, description: "availability check")
+TeamDelete(team_name: probe_name)
+# If probe fails at any step, set TIER=1 and log reason
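The probe-and-degrade control flow can be sketched in Python. The `team_create`/`team_delete` callables stand in for the TeamCreate/TeamDelete tool calls; the helper name and logging are illustrative only.

```python
def probe_tier2(team_create, team_delete, log):
    """Probe Agent Teams availability and return the tier to run at.

    A deterministic probe name keeps the check idempotent across runs:
    a leftover probe team from a crashed run is simply reused/deleted.
    """
    probe_name = "probe-tier2-activation"
    try:
        team_create(team_name=probe_name, description="availability check")
    except Exception as exc:
        # Graceful degradation: fall back to Tier 1 and record why.
        log(f"TIER=1: TeamCreate probe failed ({exc})")
        return 1
    try:
        team_delete(team_name=probe_name)
    except Exception as exc:
        # Teams demonstrably work; just record the leaked probe team.
        log(f"probe cleanup failed, team may linger: {exc}")
    return 2

created, deleted, logs = [], [], []
assert probe_tier2(lambda **kw: created.append(kw),
                   lambda **kw: deleted.append(kw),
                   logs.append) == 2
assert created and deleted
```

The key property is that cleanup failure does not demote the tier: a successful TeamCreate already proves availability.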
```
// (Optional -- critical tasks only) Teammate C -- defensive approach
Spawn teammate "competitor-c" with prompt:
  "Implement {task_description} using approach: DEFENSIVE.
  Maximize error handling, edge case coverage, and robustness.
  Work in isolation until the review phase.
  Phase: {N}, Plan: {id}, Issue: #{phase_issue_number}.
  Success criteria: {criteria from plan}.
  When done, commit your work and report RESULT: PASS or RESULT: FAIL."
Model: {executor_model}
```

**Step 3 -- Adversarial critique via SendMessage:**

After all teammates complete, each reviews the others' implementations:

```
SendMessage({
  type: "message",
  recipient: "competitor-b",
  content: "Review competitor-a's implementation. Be adversarial: (1) correctness issues, (2) missing edge cases, (3) maintainability concerns. Find real problems, not style preferences.",
  summary: "Requesting adversarial review of competitor-a"
})

SendMessage({
  type: "message",
  recipient: "competitor-a",
  content: "Review competitor-b's implementation. Be adversarial: (1) correctness issues, (2) missing edge cases, (3) maintainability concerns. Find real problems, not style preferences.",
  summary: "Requesting adversarial review of competitor-b"
})
```
Pattern 1 allows spawning an optional competitor-c for critical tasks, but the SendMessage critique examples only request reviews between competitor-a and competitor-b. If competitor-c participates, include critique exchanges involving competitor-c (and pass those critiques into the verifier prompt), or state that the debate pattern is strictly 2-way.
```
SendMessage({
  type: "message",
  recipient: "competitor-b",
  content: "Review competitor-a's implementation. Be adversarial: (1) correctness issues, (2) missing edge cases, (3) maintainability concerns. Find real problems, not style preferences.",
  summary: "Requesting adversarial review of competitor-a"
})
```
These Tier 2 patterns use the SendMessage({ type, recipient, content, summary }) schema, but docs/spec/agent-teams-research.md documents a newer v2.1.75+ schema using to/message/summary and calls out a breaking change. To avoid shipping “executable” examples that may be wrong depending on runtime version, please reconcile the repo docs (pick one schema + version guard, or note both with guidance on which to use).
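Until the docs are reconciled, an adapter can bridge the two payload shapes. This is a hedged sketch: both field sets are taken from the review comment above ({type, recipient, content, summary} versus the newer {to, message, summary}), and the mapping must be verified against the actual runtime schema before relying on it.

```python
def to_v2_send_message(msg: dict) -> dict:
    """Map an older-style SendMessage payload onto the newer field names.

    Assumed mapping (verify against docs/spec/agent-teams-research.md):
    recipient -> to, content -> message, summary -> summary; the fixed
    type field is dropped because the newer shape has no equivalent here.
    """
    return {
        "to": msg["recipient"],
        "message": msg["content"],
        "summary": msg.get("summary", ""),
    }

old = {
    "type": "message",
    "recipient": "competitor-b",
    "content": "Review competitor-a's implementation.",
    "summary": "Requesting adversarial review of competitor-a",
}
assert to_v2_send_message(old)["to"] == "competitor-b"
```

A version guard in the docs (run this mapping only on v2.1.75+) would let one canonical set of examples serve both runtimes.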
Summary
- Replaces prose in `execute.md` section 6.3 with concrete `TeamCreate`/`SendMessage` call syntax for the competitive implementation debate pattern
- Adds three complete Tier 2 workflow patterns to `maxsim-batch/SKILL.md`: competitive implementation (debate), multi-reviewer code review (cross-checking), and collaborative debugging (adversarial hypothesis testing)

Addresses PROJECT.md §7.2 audit gap (Parallelism PARTIAL 1-3).
Test plan
- `agent-teams-guide.md` API reference (SendMessage parameters: type, recipient, content, summary)

🤖 Generated with Claude Code